knitr::opts_chunk$set(echo=TRUE, warning=FALSE,
message=FALSE, eval=TRUE, cache=TRUE,
fig.width=20, fig.height=20)
library(ncdf4)
## Helper function
windowsmooth <- function(x, n){
## x = rnorm(100)
## n=10
ii = seq.int(from=0, to=length(x)-n, length=n)
jj = ii + n
inds = Map(function(a,b)(a+1):b, ii, jj)
sapply(inds, function(ind)mean(x[ind]))
}
(Heavily borrowed from the Paul’s Bayesian workshop slides on matrix population modeling here slides
In cell-level (cytometry) data collected over time from the ocean, one interesting derived dataset is the empirical distribution of biomass (or counts) over time. Using scientific intuition about the cells’ interaction with the environment, we can model the evolution of the empirical biomass distribution over time. This lends valuable empirical evidence of our theoretical understanding of cells’ life cycles in the ocean.
We describe a matrix population model.
First, discretize the empirical biomass distribution into \(m\) sizes.
The \(i\)’th size class has a diameter between \(v_i\) and \(v_{i+1}\), which are defined as:
\[ v_{i}=v_{\min } 2^{(i-1) \Delta_{v}} \text { for } i=1, \ldots, m+1.\]
The empirical distribution at time \(t\) is:
\[w(t) \in \mathbb{R}^{m},\]
whose \(i\)’th entry (\(i=1,\cdots,m\)) is the normalized counts (or amount of biomass) in the \(i\)’th size class; normalization is done so that \(\sum_{i=1}^{n} w_{i}(t)=1\).
The main mechanism for evolution over time is through the time propagation matrix:
\[A\left(t, \Delta_{t}\right) \in \mathbb{R}^{m \times m} \quad\]
defined as the transition of the distribution from time \(t\) to a later time \(t+\Delta_t\). This matrix will directly encode three basic type of cell activities.
Growth: The entries in the blue, which are subdiagonal entries, represent a transition to the next larger size class.
Division: The entries in the orange cells, \(j\) away from the diagonal, represent a division of cells so that biomass/counts transition from the \(j\)’th cell to the size class in the diagonal, representing an exact halving of the cell size. (The value of \(j\) is determined by the diameters in the size class)
Stasis: The diagonal entries. The rest of the cells remain at their size class.
Using this evolution mechanism, the size distribution at the next time is defined as:
\[ w\left(t+\Delta_{t}\right)=\frac{A\left(t, \Delta_{t}\right) w(t)}{\left\|A\left(t, \Delta_{t}\right) w(t)\right\|_{1}},\]
which is the normalized size distribution after having evolved by a left multiplication by \(A(t, \Delta)\).
Driven directly by the sunlight.
\[\gamma\left(t, \Delta_{t}\right)=\Delta_{t} \gamma_{\max }\left(1-\exp \left(-\frac{E(t)}{E^{*}}\right)\right)\]
TODO: be more detailed about this e.g. define the variables, explain this effect in words.
TODO: will growth be size dependent? Ask the group’s opinion.
Possibly size-dependent, parametric or non-parametric. Splines are one way to go.
One major missing component (in the model described thus far) could be cell respiration, another major known cell activity that directly affects the cell size distribution. There were several ideas
Basically, two days’ worth of 2-hourly size distribution data collected in a controlled lab environment.
TODO: describe more rigorously here.
##' Helper to open Zinser data.
##' @param filename NC file filename, like
##' \code{"data/Zinser_SizeDist_calibrated-52-12.nc"}.
##' @return List containing data \code{obs} (T x m matrix), \code{time} (T
##' length list of minutes passed since the beginning), \code{num_sizeclass}
##' (Number of size classes, or m), and \code{v} (left-hand-side border of
##' size classes).
open_zinser <- function(filename = "Zinser_SizeDist_calibrated-52-12.nc",
datadir = "data"){
## Read data.
## "data/Zinser_SizeDist_calibrated-52-12.nc"
nc <- nc_open(file.path(datadir, filename))
PAR <- ncvar_get(nc,'PAR') ## Sunlight
w_obs <- ncvar_get(nc,'w_obs') ## Data
## List variables in the nc file.
names(nc$var)
## What are these?
m <- ncvar_get(nc,'m') ## Number of size classes
## Time (in number of minutes since the beginning)
time <- ncvar_get(nc,'time')
## Size classes
delta_v_inv <- ncvar_get(nc,'delta_v_inv')
v_min <- ncvar_get(nc,'v_min')
delta_v <- 1/delta_v_inv
v <- v_min * 2^(((1:m) - 1) * delta_v)
## Name of w_obs
rownames(w_obs) = time
colnames(w_obs) = v
## Return
return(list(obs = w_obs,
time = time,
num_sizeclass = m,
v = v,
nc = nc,
PAR = PAR))
}
We can see that the growth and division is in concordance with the sunlight variable par, over the two day period of the data.
The bottom plot is another version of the data at a different number of size classes. There are half as many size categories (26 of them), and the same number of (\(25\)) time points.
## Retrieve data
for(filename in c("Zinser_SizeDist_calibrated-52-12.nc",
"Zinser_SizeDist_calibrated-26-6.nc")){
## Load data
dat = open_zinser(filename)
## Plot *calibrated* Zinser data.
ncat = ncol(dat$obs) ## alternatively, ncat = dat$num_sizeclass
cols = colorRamps::blue2red(ncat)
ylim = range(dat$obs)
matplot(dat$obs, col = cols, lwd = seq(from = 0.1, to = 5, length = ncat),
lty = 1, type = 'l', ylab = "", xlab = "")
title(main = paste0("Zinser data, ", ncat, " size classes"))
## Making i_test
d = 3
i_test = rep(0, length(dat$time))
i_test[seq(from=d, to=length(dat$time), by=d)] = 1
## Add grey regions for the time points that we're going to leave out.
cex = 1
abline(v = which(i_test == 1), col='grey', lwd=3, lty=1)
for(ii in which(i_test == 1)){
points(x=rep(ii, dat$num_sizeclass), y=dat$obs[ii,], pch=16, col="grey", cex=cex)
points(x=rep(ii, dat$num_sizeclass), y=dat$obs[ii,], pch=16, col="white", cex=cex * 0.7)
points(x=rep(ii, dat$num_sizeclass), y=dat$obs[ii,], pch=1, col="grey", cex=cex)
}
## Add sunlight
scpar = scale(dat$PAR)
scpar = scpar - min(scpar)
scpar = scpar / max(scpar) * max(dat$obs)
lines(scpar, lwd = 5)
## Add legend
legend("topright", lwd=3, col='grey', pch=1, legend="Test data")
}
Collected from the ocean, following a parcel of water. Much more fine grained in time (TODO maybe we can utilize this?).
TODO: describe more rigorously here.
We can make a visualization of the seaflow data, from the file data/SeaFlow_SizeDist_regrid-15-5.nc in the github repository https://github.com/fribalet/Bayesian-matrixmodel .
## Setup
ndays <- 4.0
limit_to_numdays <- 4.0
stride_t_obs <- 10
data <- list()
data$dt <- 10
source(file.path('data_processing.r'))
## Visualize the data
ncat = nrow(data$obs)
cols = colorRamps::blue2red(ncat)
ylim = range(data$obs)
## Make two plots:
par(mfrow=c(2,1))
par = windowsmooth(data$PAR, ncol(data$obs))
## Smaller sizes
iis=1:7
matplot(t(data$obs), type='l', lwd=2, lty=1, ylim=ylim, col='grey80',
ylab="")
lines(par/max(par) * 0.2, lwd=3, lty=3)
cutoff = quantile(par/max(par), 0.2)
lines(pmin(par/max(par) * 0.2, cutoff), lwd=3, lty=1)
matlines(t(data$obs)[,iis], type='l', lwd=2, col=cols[iis], lty=1, ylim=ylim)
title(main="Smaller sizes")
## Larger sizes
iis=8:15
matplot(t(data$obs), type='l', lwd=2, col='grey80', lty=1, ylim=ylim,
main="Larger sizes", ylab="")
lines(par/max(par) * 0.2, lwd=3, lty=3)
cutoff = quantile(par/max(par), 0.2)
lines(pmin(par/max(par) * 0.2, cutoff), lwd=5, lty=1)
matlines(t(data$obs)[,iis], type='l', lwd=2, col=cols[iis], lty=1, ylim=ylim)
We can see that the
Small size distributions negatively correlate with the large size distributions,
Their diel cycles are clearly visible,
The small size distributions increase over night (because division is prominent), and
The large size distributions increase with a slight lag to the sunlight cycle.